PDetect: A Clustering Approach for Detecting Plagiarism in Source Code Datasets

Authors

  • Lefteris Moussiades
  • Athena Vakali
Abstract

Efficient detection of plagiarism in students' programming assignments is of great importance to the educational process. This paper presents a clustering-oriented approach to the problem of source code plagiarism. The implemented software, called PDetect, accepts as input a set of program sources and extracts subsets (the clusters of plagiarism) such that each program within a particular subset has been derived from the same original. PDetect proposes an appropriate measure for evaluating plagiarism detection performance and supports combining different plagiarism detection schemes. Furthermore, a cluster analysis is performed to provide information useful to the plagiarism detection process. PDetect is designed so that it can easily be adapted to any keyword-based programming language, and it compares favorably with earlier (state-of-the-art) plagiarism detection approaches.
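The abstract names the ingredients (keyword-based source handling, combined detection schemes, clusters of programs derived from a common original) but not the concrete similarity measures or clustering algorithm. The following is a minimal sketch under stated assumptions, not PDetect's actual method: it normalizes each source into keyword-preserving tokens, scores each pair with two hypothetical schemes (Jaccard similarity of token 3-grams and cosine similarity of token frequencies), averages them, and groups programs by single-linkage clustering with a union-find structure at an assumed threshold of 0.8. The keyword set, the function names (`normalize`, `combined_similarity`, `plagiarism_clusters`), and the threshold are all illustrative.

```python
import math
import re
from collections import Counter
from itertools import combinations

# HYPOTHETICAL keyword set for a small C-like language. The abstract only says
# PDetect adapts to any keyword-based language; this list is illustrative.
KEYWORDS = {
    "int", "float", "char", "void", "if", "else", "for", "while",
    "do", "return", "break", "continue", "switch", "case", "struct",
}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source: str) -> list[str]:
    """Keep keywords and punctuation; map other identifiers to ID and
    integer literals to NUM so that renaming does not hide copying."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngram_similarity(x: list[str], y: list[str], n: int = 3) -> float:
    """Scheme 1 (assumed): Jaccard similarity of token n-gram sets.
    Two programs too short to form any n-gram are treated as identical."""
    gx = {tuple(x[i:i + n]) for i in range(len(x) - n + 1)}
    gy = {tuple(y[i:i + n]) for i in range(len(y) - n + 1)}
    if not gx and not gy:
        return 1.0
    return len(gx & gy) / len(gx | gy)


def frequency_similarity(x: list[str], y: list[str]) -> float:
    """Scheme 2 (assumed): cosine similarity of token frequency vectors."""
    cx, cy = Counter(x), Counter(y)
    dot = sum(count * cy[tok] for tok, count in cx.items())
    norm = math.sqrt(sum(v * v for v in cx.values())) * math.sqrt(
        sum(v * v for v in cy.values())
    )
    return dot / norm if norm else 0.0


def combined_similarity(x: list[str], y: list[str]) -> float:
    """Combine the two schemes with an unweighted mean (an assumption)."""
    return (ngram_similarity(x, y) + frequency_similarity(x, y)) / 2


def plagiarism_clusters(sources: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Group submissions whose combined similarity reaches `threshold`,
    using single-linkage clustering implemented with union-find."""
    tokens = {name: normalize(src) for name, src in sources.items()}
    parent = {name: name for name in sources}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(sources, 2):
        if combined_similarity(tokens[a], tokens[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in sources:
        groups.setdefault(find(name), []).append(name)
    # Only groups with two or more members are candidate plagiarism clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    submissions = {
        "alice.c": "int sum(int n){int s=0;for(int i=0;i<n;i++)s+=i;return s;}",
        "bob.c": "int total(int k){int t=0;for(int j=0;j<k;j++)t+=j;return t;}",
        "carol.c": 'void hello(){printf("hi");}',
    }
    print(plagiarism_clusters(submissions))  # → [['alice.c', 'bob.c']]
```

Here `alice.c` and `bob.c` differ only in identifier names, so after normalization their token streams are identical and they fall into one cluster, while `carol.c` stays unclustered. Single-linkage was chosen only because it directly yields groups in which every member is connected to the others through high-similarity pairs; the paper may use a different clustering method.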

Similar resources

An Abstract Method Linearization for Detecting Source Code Plagiarism in Object-Oriented Environment

Despite the fact that plagiarizing source code is a trivial task for most CS students, detecting such unethical behavior requires a considerable amount of effort. Thus, several plagiarism detection systems were developed to handle such issue. This paper extends Karnalim’s work, a low-level approach for detecting Java source code plagiarism, by incorporating abstract method linearization. Such e...

Source Code Plagiarism in Computer Engineering Courses

In today’s university life, teachers are often confronted with plagiarism. A special form of plagiarism is source code plagiarism typically found in programming courses at universities and schools. Detecting or even preventing source code plagiarism is by no means a trivial task. Therefore, this paper explains and discusses different methods that can be used to prevent and detect source code pl...

A Comparison of Similarity Techniques for Detecting Source Code Plagiarism

Academic dishonesty is a universal problem. Detecting duplicated text among natural language artifacts is a well-documented task. However, performing similar analysis on source code presents unique problems. In this paper, I present a comparison of the application of various techniques in textual similarity processing on source code. Beyond this, I investigate the application of textual similari...

Detecting Disguised Plagiarism

Source code plagiarism detection is a problem that has been addressed several times before; and several tools have been developed for that purpose. In this research project we investigated a set of possible disguises that can be mechanically applied to plagiarized source code to defeat plagiarism detection tools. We propose a preprocessor to be used with existing plagiarism detection tools to "...

COAT: Code ObfuscAtion Tool to evaluate the performance of code plagiarism detection tools

There exist many plagiarism detection tools to uncover plagiarized codes by analyzing the similarity of source codes. To measure how reliable those plagiarism detection tools are, we developed a tool named Code ObfuscAtion Tool (COAT) that takes a program source code as input and produces another source code that is exactly equivalent to the input source code in their functional behaviors but w...

Journal:
  • Comput. J.

Volume 48

Publication date: 2005